Hugging Face's LLM post-training library TRL has reached v1.0. The new Stable/Experimental API tiers, the stabilization of the GRPO, DPO, and SFT trainers, and a roadmap that includes asynchronous GRPO all point to a more mature stack.
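DPO, one of the now-stable trainers, optimizes a simple preference objective. A minimal sketch of that loss in plain Python (the log-probabilities below are toy numbers, not real model outputs):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * (policy log-ratio - reference log-ratio))."""
    pi_logratio = policy_chosen_logp - policy_rejected_logp
    ref_logratio = ref_chosen_logp - ref_rejected_logp
    logits = beta * (pi_logratio - ref_logratio)
    return -math.log(1 / (1 + math.exp(-logits)))  # -log sigmoid

# Toy numbers: the policy prefers the chosen answer more strongly
# than the reference model does, so the loss dips below
# -log(0.5) ~= 0.693.
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0)
print(round(loss, 4))
```

TRL's `DPOTrainer` computes this same objective over batches of (chosen, rejected) pairs; the sketch only shows the per-pair math.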
Chroma has published a 20B-parameter self-editing search agent. It performs multi-hop search while dynamically pruning its context, and matches or exceeds the accuracy of frontier models at roughly 1/10 the cost and up to 10x lower latency. The weights are released under the Apache 2.0 license.
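The multi-hop-search-with-pruning loop can be illustrated with a toy sketch. The corpus, scoring function, and context budget here are invented for illustration; the real agent edits its context with a learned policy, not word overlap:

```python
# Toy multi-hop search: each hop retrieves one new passage, then
# low-scoring passages are pruned so the context stays in budget.
CORPUS = {
    "d1": "Paris is the capital of France.",
    "d2": "France borders Spain.",
    "d3": "The Eiffel Tower is in Paris.",
    "d4": "Spain's capital is Madrid.",
}

def score(query, text):
    """Crude relevance score: count of shared lowercase words."""
    words = text.lower().replace(".", "").split()
    return len(set(query.lower().split()) & set(words))

def multi_hop_search(query, hops=2, budget=2):
    context = []  # list of (doc_id, text) currently kept
    for _ in range(hops):
        # Retrieve: add the best-matching document not yet seen.
        seen = {d for d, _ in context}
        candidates = [(d, t) for d, t in CORPUS.items() if d not in seen]
        candidates.sort(key=lambda dt: score(query, dt[1]), reverse=True)
        if candidates:
            context.append(candidates[0])
        # Prune: keep only the `budget` most relevant passages.
        context.sort(key=lambda dt: score(query, dt[1]), reverse=True)
        context = context[:budget]
    return [d for d, _ in context]

print(multi_hop_search("capital of France"))  # → ['d1', 'd2']
```

The pruning step is what keeps cost and latency flat as hops accumulate: the context never grows past the budget, regardless of how many searches run.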
Cursor released Composer 2 without disclosing its base model; probing its OpenAI-compatible API revealed it to be Kimi K2.5. This escalated into a licensing dispute, but a formal commercial agreement with Moonshot AI was subsequently confirmed.
Hugging Face published a comparative analysis of 16 open-source RL training libraries along 7 design axes. In synchronous designs, GPU utilization hovers around 60% because generation is the bottleneck, while an asynchronous, disaggregated design can push it above 95%.
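The utilization gap follows from back-of-the-envelope arithmetic. The step times below are invented for illustration; only the ratio matters:

```python
# Toy model of trainer-GPU utilization in RL training.
# Assumed per-step times: rollout generation 40 s, training update 60 s.
gen_time, train_time = 40.0, 60.0

# Synchronous: the same GPUs alternate between generation and
# training, so they do useful training work only part of the time.
sync_util = train_time / (gen_time + train_time)

# Asynchronous separation: generation runs on dedicated inference
# GPUs and overlaps with training, so trainer GPUs stay busy almost
# continuously (ignoring weight-sync pauses).
async_util = train_time / max(gen_time, train_time)

print(f"sync: {sync_util:.0%}, async: {async_util:.0%}")
```

With these assumed times the synchronous design lands at exactly the ~60% the analysis reports; the asynchronous figure stays below 100% in practice because of weight synchronization and off-policy staleness limits.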
Microsoft released an open-source framework that can optimize almost any AI agent with reinforcement learning, with little to no code changes. It supports arbitrary agent frameworks, including LangChain, AutoGen, and the Claude Agent SDK.
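The "little to no code changes" idea is usually achieved by instrumenting agent calls from the outside rather than rewriting the agent. A toy sketch of that pattern (the decorator and trajectory format here are hypothetical illustrations, not Microsoft's actual API):

```python
import functools

# Hypothetical sketch: wrap an existing agent function so every call
# is recorded as a trajectory step for later RL training, without
# touching the agent's own code. Not the actual Microsoft API.
TRAJECTORIES = []

def record_for_rl(agent_fn):
    """Decorator: log (input, output) pairs for an RL trainer."""
    @functools.wraps(agent_fn)
    def wrapped(*args, **kwargs):
        output = agent_fn(*args, **kwargs)
        TRAJECTORIES.append({"input": args, "output": output})
        return output
    return wrapped

# An unmodified "agent" from any framework gets wrapped like this:
@record_for_rl
def toy_agent(question):
    return f"answer to: {question}"

toy_agent("What is 2+2?")
print(len(TRAJECTORIES))  # one recorded step
```

Because the wrapper sits at the call boundary, the same instrumentation works whether the inner function is a LangChain chain, an AutoGen agent, or a plain Python function.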